---
title: Lindera
description: Uses prebuilt dictionaries to tokenize Chinese, Japanese, and Korean text
canonical: https://docs.paradedb.com/documentation/tokenizers/available-tokenizers/lindera
---

The Lindera tokenizer is a more advanced CJK tokenizer. It uses prebuilt Chinese, Japanese, or Korean dictionaries to break text into meaningful tokens (words or phrases) rather than splitting on individual characters.
Chinese Lindera uses the CC-CEDICT dictionary, Korean Lindera uses the KoDic dictionary, and Japanese Lindera uses the IPADIC dictionary.

By default, non-CJK text is lowercased; punctuation and whitespace are preserved as separate tokens rather than discarded.

<CodeGroup>
```sql Chinese Lindera
CREATE INDEX search_idx ON mock_items
USING bm25 (id, (description::pdb.lindera(chinese)))
WITH (key_field='id');
```

```sql Korean Lindera
CREATE INDEX search_idx ON mock_items
USING bm25 (id, (description::pdb.lindera(korean)))
WITH (key_field='id');
```

```sql Japanese Lindera
CREATE INDEX search_idx ON mock_items
USING bm25 (id, (description::pdb.lindera(japanese)))
WITH (key_field='id');
```

</CodeGroup>
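Once the index is built, the tokenized field can be searched like any other. A minimal sketch, assuming the `mock_items` table from ParadeDB's quickstart and ParadeDB's `@@@` full-text search operator:

```sql
-- Search the Chinese-tokenized description field.
-- Assumes the Chinese Lindera index created above.
SELECT description
FROM mock_items
WHERE description @@@ '你好'
LIMIT 5;
```

Because the query text is tokenized with the same dictionary at search time, multi-character words match as whole units rather than as individual characters.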

To get a feel for this tokenizer, run the following query, replacing the text with your own:

```sql
SELECT 'Hello world! 你好!'::pdb.lindera(chinese)::text[];
```

```ini Expected Response
              text
--------------------------------
 {hello," ",world,!," ",你好,!}
(1 row)
```
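The same cast works for the other dictionaries. For example, the Japanese variant segments text using IPADIC (the input sentence here is an illustrative assumption, not from the ParadeDB test data):

```sql
SELECT '東京タワーに行きました'::pdb.lindera(japanese)::text[];
```

The result is a `text[]` of dictionary-derived tokens, which makes it easy to compare how each dictionary segments the same input.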
